[Deterministic] Move paddle version batch invariant pkg to Fastdeploy #4763
base: develop
Thanks for your contribution!
Please format your code.
Done.
Pull Request Overview
This PR introduces batch-invariant implementations of key PaddlePaddle operations (mm, addmm, log_softmax, mean) using Triton kernels to achieve deterministic inference results regardless of batch size. The implementation is adapted from the batch_invariant_ops library and integrated into FastDeploy.
- Adds custom Triton kernel implementations for deterministic matrix operations and reduction operations
- Provides a context manager to toggle between standard and batch-invariant modes
- Includes comprehensive test files demonstrating batch invariance for each operation
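The mode toggle described above can be pictured with a small sketch. This is not the PR's implementation: `batch_invariant_mode`, `active_impl`, and `_STATE` are hypothetical names chosen for illustration, and the real code switches Paddle operators rather than returning a label.

```python
from contextlib import contextmanager

# Hypothetical module-level flag; the PR's actual state handling may differ.
_STATE = {"batch_invariant": False}


def active_impl():
    """Report which implementation a dispatching op would pick right now."""
    return "batch_invariant" if _STATE["batch_invariant"] else "standard"


@contextmanager
def batch_invariant_mode(enabled=True):
    """Temporarily switch the mode and restore the previous value on exit."""
    previous = _STATE["batch_invariant"]
    _STATE["batch_invariant"] = enabled
    try:
        yield
    finally:
        # Restore even if the body raised, so the mode never leaks.
        _STATE["batch_invariant"] = previous


if __name__ == "__main__":
    print(active_impl())
    with batch_invariant_mode():
        print(active_impl())
    print(active_impl())
```

Saving and restoring the previous value, instead of resetting to `False`, keeps nested uses of the context manager well behaved.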
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| fastdeploy/model_executor/layers/batch_invariant_ops/batch_invariant_ops.py | Core implementation with Triton kernels for batch-invariant operations and mode switching functionality |
| fastdeploy/model_executor/layers/batch_invariant_ops/__init__.py | Module initialization exporting public API |
| tests/batch_invariant/test_batch_invariance_op_mm.py | Test suite for matrix multiplication batch invariance |
| tests/batch_invariant/test_batch_invariance_op_mean.py | Test suite for mean operation batch invariance |
| tests/batch_invariant/test_batch_invariance_op_logsoftmax.py | Test suite for log_softmax operation batch invariance |
| tests/batch_invariant/test_batch_invariance_op_addmm.py | Test suite for addmm operation batch invariance |
…ariant_ops.py Version-control leftovers in the original code comments should indeed be removed. Co-authored-by: Copilot <[email protected]>
…ariant_ops.py Co-authored-by: Copilot <[email protected]>
Motivation
Achieving batch invariance in the PaddlePaddle framework.
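Batch invariance: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
To run this, the following must be installed; paddle must be a fairly recent version (the latest is recommended).
If every value reported under Batch-Invariant Mode is 0, the result is correct.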
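The zero-difference check can be illustrated with a NumPy sketch. It is not the PR's Triton kernel or test code: `mm_fixed_order`, `max_batch_diff`, and `BLOCK_K` are hypothetical names, and the fixed-tile loop only stands in for the idea that the reduction order must not depend on the batch size.

```python
import numpy as np

BLOCK_K = 32  # fixed reduction tile; never derived from the batch size


def mm_fixed_order(a, b):
    """Matrix multiply where each output row uses the same reduction order."""
    m, k = a.shape
    out = np.zeros((m, b.shape[1]), dtype=np.float32)
    for row in range(m):
        acc = np.zeros(b.shape[1], dtype=np.float32)
        # Walk the K dimension in fixed tiles, always in the same order.
        for start in range(0, k, BLOCK_K):
            stop = start + BLOCK_K
            acc += (a[row, start:stop, None] * b[start:stop]).sum(axis=0)
        out[row] = acc
    return out


def max_batch_diff(mm, a, b):
    """Largest gap between row 0 computed alone and row 0 taken from the full batch."""
    single = mm(a[:1], b)
    batched = mm(a, b)[:1]
    return float(np.abs(single - batched).max())


rng = np.random.default_rng(0)
a = rng.standard_normal((8, 256), dtype=np.float32)
b = rng.standard_normal((256, 16), dtype=np.float32)

if __name__ == "__main__":
    print("fixed-order diff:", max_batch_diff(mm_fixed_order, a, b))
    print("library matmul diff:", max_batch_diff(lambda x, y: x @ y, a, b))
```

The fixed-order variant reports a difference of 0 by construction, because each row is reduced identically whatever the batch size. Whether the library matmul also reports 0 depends on the backend, which is the behaviour the PR's tests probe on GPU.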

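So far, log_softmax is the only operator that, even with carefully constructed input data, appears to be batch-invariant already in the original implementation.
TODO: strictly align the APIs (mm and log_softmax still have issues); consider merging the test cases into a single file; resolve the TODOs listed in the files.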
Modifications
Usage or Command
Accuracy Tests
Checklist
- Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a PR to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.